Russian Monitor Corpora: Composition, Linguistic Encoding and Internet Publication

Author

  • Serge A. Yablonsky
Abstract

Russian monitor corpora seek to reflect the current status of Russian. The corpus contains today [...] million words and will never be complete because, like language itself, it is always developing. Some new examples of language are being added while other texts are deleted, to ensure that the corpus represents the current state of the language. Progress in Russian language processing affords an opportunity to apply its results to creating Russian monitor corpora that are strongly connected with a set of electronic dictionaries with the help of linguistic software. Our approach is particularly dependent on monitoring of Russian resources published on the Internet and on CD, on the language processor Russicon, and on wide usage of Russicon electronic dictionaries. The pilot corpus query system for Java, in its Internet version, allows:

  • to use a selected subcorpus or subcorpora, or the whole corpus
  • to search for a word in its particular form or in a whole paradigm
  • to change the length of the context from one line by default to more lines
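The three query options listed in the abstract can be illustrated with a minimal Java sketch. It is not the Russicon implementation: the class `ConcordanceSketch`, the `search` method, its parameters, and the tiny `PARADIGMS` table standing in for an electronic dictionary are all hypothetical. Toy English data is used only to keep the example short; the tokenizer splits on any non-letter character, so Cyrillic text would be tokenized the same way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names and data here are assumptions.
class ConcordanceSketch {

    // Hypothetical stand-in for an electronic dictionary: lemma -> word forms.
    static final Map<String, Set<String>> PARADIGMS =
        Map.of("go", Set.of("go", "goes", "going", "went", "gone"));

    // True if any token of the line (split on non-letters) is one of the forms.
    static boolean containsForm(String line, Set<String> forms) {
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (forms.contains(token)) return true;
        }
        return false;
    }

    // corpus:        subcorpus name -> its lines, in order
    // selected:      subcorpora to search; an empty set means the whole corpus
    // wholeParadigm: false = match the exact form, true = match every form of the lemma
    // extraLines:    context lines shown on each side of a hit (0 = the hit line only)
    static List<String> search(Map<String, List<String>> corpus, Set<String> selected,
                               String query, boolean wholeParadigm, int extraLines) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> forms = wholeParadigm ? PARADIGMS.getOrDefault(q, Set.of(q)) : Set.of(q);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> sub : corpus.entrySet()) {
            if (!selected.isEmpty() && !selected.contains(sub.getKey())) continue;
            List<String> lines = sub.getValue();
            for (int i = 0; i < lines.size(); i++) {
                if (!containsForm(lines.get(i), forms)) continue;
                // Clamp the context window to the bounds of the subcorpus.
                int from = Math.max(0, i - extraLines);
                int to = Math.min(lines.size(), i + extraLines + 1);
                hits.add(sub.getKey() + ": " + String.join(" / ", lines.subList(from, to)));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> corpus = new LinkedHashMap<>();
        corpus.put("news", List.of("Prices went up again.", "Analysts expect more."));
        corpus.put("fiction", List.of("She goes home.", "The night is quiet."));
        // Whole corpus, whole paradigm of "go", default one-line context.
        System.out.println(search(corpus, Set.of(), "go", true, 0));
        // One subcorpus, exact form "goes", one extra line of context on each side.
        System.out.println(search(corpus, Set.of("fiction"), "goes", false, 1));
    }
}
```

In this sketch the paradigm lookup is keyed by lemma only, so a paradigm search starting from an inflected form would first need lemmatization, which a real morphological dictionary would supply.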


Similar resources

Integration of Russian Language Resources

In this paper we describe the creation of large scale linguistic resources for Russian language. Internet/intranet system architecture was developed to make a large volume of Russian language lexical information, corpora (texts) and knowledge base (Russian WordNet) available to the system at development and/or run time. There are four linguistic counterparts, corresponding to the major categori...


Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity

In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...


Encoding Linguistic Corpora

This paper describes the motivation and design of the Corpus Encoding Standard (CES) (Ide, et al., (1996); Ide, 1998), an encoding standard for linguistic corpora intended to meet the need for the development of standardized encoding practices for linguistic corpora. The CES identifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive repre...


Collection, Annotation and Analysis of Gold Standard Corpora for Knowledge-Rich Context Extraction in Russian and German

This paper describes the collection, annotation and linguistic analysis of a gold standard for knowledge-rich context extraction on the basis of Russian and German web corpora as part of ongoing PhD thesis work. In the following sections, the concept of knowledge-rich contexts is refined and gold standard creation is described. Linguistic analyses of the gold standard data and their results are...


Design and Data Collection for the Accentological Corpus of the Russian Language

Accentological corpus provides a researcher an opportunity to study word stress and stress variation, which are very important for the Russian language. Moreover, Accentological corpus allows studying the history of the Russian language stress development. The research presents the main characteristics of Accentological corpus available at ruscorpora.ru. Corpora size, type and sources of text m...



Publication date: 2000